The following program generates an artificial sample of size 50 from the standard normal and bootstraps the distribution of the sample mean and median. Note that the theoretical standard error of the sample median for the standard normal of size n is Ö (0.5p / n) which is approximately 0.177 for n = 50. By comparison, the standard error of the sample mean is Ö(1 / n) » 0.141.
' bootstrap sample mean and median ( 11/2/99 h)' series proc version (slower than matrix version)' last revised 3/07/2007 ' set size of sample!n = 50 ' create workfilewfcreate bootmed1 u 1 !n ' set seed of random number generatorrndseed 456 ' generate pseudo-draws from a standard normalseries x = nrnd ' set number of bootstrap replications!reps = 10000 ' store means and medians in matrixmatrix(!reps,2) out ' bootstrap loopticfor !i = 1 to !reps ' generate bootstrap sample x.resample x_b ' store bootstrap median out(!i,1) = @median(x_b) ' store bootstrap mean out(!i,2) = @mean(x_b)nextscalar elp = @toc ' display descriptive statisticsshow out.stats ' theoretical standard error of median from normal' se(med) = 1 / ( 2*m*(obs)^0.5 )' where m is the median value' (Kendall-Stuart, 1977, 4th ed, vol 1, p.252)!pi = @acos(-1)scalar se_med = @sqrt(0.5*!pi/!n)scalar se_mea = @sqrt(1/!n) ' convert bootstrap sample into seriesexpand 1 !repssmpl 1 !reps series out_mediseries out_meangroup gout out_medi out_meanmtos(out, gout) ' display results in tabletable tabsetcolwidth(tab,1,20) tab(1,2) = "median"tab(1,3) = "mean" setline(tab,2) tab(3,1) = "bootstrap mean:"tab(3,2) = @mean(out_medi)tab(3,3) = @mean(out_mean) tab(4,1) = "bootstrap s.d.:"tab(4,2) = @stdev(out_medi)tab(4,3) = @stdev(out_mean) tab(5,1) = "theoretical s.d.:"tab(5,2) = se_medtab(5,3) = se_mea setline(tab,6) tab(7,1) = "sample size = "tab(7,2) = @str(!n) tab(8,1) = "bootstrap replications = " tab(8,2) = @str(!reps) tab(9,1) = "elapsed time = " tab(9,2) = @str(elp) + " seconds"show tab
The following is the matrix function version of the above program that produces identical results, but takes less time to process.
' bootstrap sample mean and median' matrix function version (faster than series version)'last revised 3/07/2007 ' create workfilewfcreate bootmedian2 u 1 1 ' set seed of random number generatorrndseed 456 ' set size of sample!n = 50 ' generate pseudo-draws from a standard normalmatrix(!n,1) xnrnd(x) ' set number of bootstrap replications!reps = 10000 ' store means and medians in vectorsvector(!reps,1) out_medivector(!reps,1) out_mean ' declare matrix to hold bootstrap samplematrix x_b ' bootstrap loop' reset timerticfor !i = 1 to !reps ' generate bootstrap sample x_b = @resample(x) ' store bootstrap median out_medi(!i) = @median(x_b) ' store bootstrap mean out_mean(!i) = @mean(x_b)next' store elapsed time in secondsscalar elp = @toc ' theoretical standard error of median from normal' se(med) = 1 / ( 2*m*(obs)^0.5 )' where m is the median value' (Kendall-Stuart, 1977, 4th ed, vol 1, p.252)!pi = @acos(-1)scalar se_med = @sqrt(0.5*!pi/!n)scalar se_mea = @sqrt(1/!n) ' display results in tabletable tabsetcolwidth(tab,1,20) tab(1,2) = "median"tab(1,3) = "mean" setline(tab,2) tab(3,1) = "bootstrap mean:"tab(3,2) = @mean(out_medi)tab(3,3) = @mean(out_mean) tab(4,1) = "bootstrap s.d.:"tab(4,2) = @stdev(out_medi)tab(4,3) = @stdev(out_mean) tab(5,1) = "theoretical s.d.:"tab(5,2) = se_medtab(5,3) = se_mea setline(tab,6) tab(7,1) = "sample size = "tab(7,2) = @str(!n) tab(8,1) = "bootstrap replications = " tab(8,2) = @str(!reps) tab(9,1) = "elapsed time = " tab(9,2) = @str(elp) + " seconds"show tab
The following program replicates the permutation test in Johnston-DiNardo (1997, 11.3). The hypothesis under test is whether there is a significant difference in the sample of employment changes between two states. The program constructs a nonparameteric 95% confidence interval of the difference in means using 1000 permutation samples.
' two-sample permutation test ( 11/3/99 )' (Johnston-DiNardo 11.2)' matrix permutation function (fast)' checked 3/6/2007 ' create workfilewfcreate cardkrug u 1 1 ' input data in matrix (first 33 rows from N.J.)matrix(40) datadata.fill -20, -17.5, -13, -12.5, -4.5, -4, -3.5, -2, -1.5, -1, -0.5, -0.5, 0, 0, 0.5, 0.5, 1.5, 2, 2, 2.25, 3, 4.5, 4.5, 5.5, 6, 6.25, 8.25, 9, 10, 10.5, 12, 14.75, 34, -7, -6, -2.5, -0.5, 4, 4.5, 4.5 ' extract submatrices for each samplematrix sub0 = @subextract(data,1,1,33,1)matrix sub1 = @subextract(data,34,1) ' compute difference in means for actual samplescalar mdiff = @mean(sub0) - @mean(sub1) ' set number of replications!reps = 1000 ' set seed of random number generatorrndseed 123456 ' declare storage matricesmatrix pdata ' permuted datavector(!reps) pdiff ' difference in means from pdata ' permutation loopfor !i=1 to !reps ' permute data pdata = @permute(data) ' extract submatrices from permuted data sub0 = @subextract(pdata,1,1,33,1) sub1 = @subextract(pdata,34,1) ' store difference in means pdiff(!i) = @mean(sub0) - @mean(sub1) next ' 95% interval (percentile method)scalar lower = @quantile(pdiff,0.025) scalar upper = @quantile(pdiff,0.975) ' display output in tabletable outsetcolwidth(out,1,30)out(1,1) = "Sample difference in means:"out(1,2) = mdiff out(2,1) = "Permutation confidence interval:"out(2,2) = "[" + @str(lower) + "," + @str(upper) + "]" out(3,1) = "Number of permutations:"out(3,2) = @str(!reps)show out
For reference, we also provide a program that uses the group resample procedure. This program also computes the t-statistic for testing the difference of means.
' two-sample t-test ( 11/3/99 )' (Johnston-DiNardo 11.2)' group resample function (slow)' checked 3/6/2007 ' create workfilewfcreate cardkrug u 1 40 ' input data in matrix' first 33 from N.J.series xx.fill -20, -17.5, -13, -12.5, -4.5, -4, -3.5, -2, -1.5, -1, -0.5, -0.5, 0, 0, 0.5, 0.5, 1.5, 2, 2, 2.25, 3, 4.5, 4.5, 5.5, 6, 6.25, 8.25, 9, 10, 10.5, 12, 14.75, 34, -7, -6, -2.5, -0.5, 4, 4.5, 4.5 ' create dummy for each subsampleseries dum = 0smpl 1 33dum = 1smpl @all '----------------------------------------------------------------------------' parametric t-test'---------------------------------------------------------------------------- freeze(ttest) x.testby(mean) dumshow ttest '----------------------------------------------------------------------------' non-parametric permutation test'---------------------------------------------------------------------------- ' compute difference in means for actual sampleseries mdiff = @mean(x,"@all if dum=1") - @mean(x,"@all if dum=0")scalar dmean = @elem(mdiff,"1") ' set number of replications!reps = 1000 ' set seed of random number generatorrndseed 123456 ' declare storage matricesmatrix pdata ' permuted datavector(!reps) pdiff ' difference in means from pdata ' permutation loopfor !i=1 to !reps ' permute data x.resample(permute) x_b ' compute difference in means mdiff = @mean(x_b,"@all if dum=1") - @mean(x_b,"@all if dum=0") ' store difference in means pdiff(!i) = @elem(mdiff,"1") next ' 95% interval (percentile method)scalar lower = @quantile(pdiff,0.025) scalar upper = @quantile(pdiff,0.975) ' display output in tabletable outsetcolwidth(out,1,30)out(1,1) = "Sample difference in means:"out(1,2) = dmean out(2,1) = "Permutation confidence interval:"out(2,2) = "[" + @str(lower) + "," + @str(upper) + "]" out(3,1) = "Number of permutations:"out(3,2) = @str(!reps)show out